Overview

Dataset statistics

Number of variables: 34
Number of observations: 100000
Missing cells: 100031
Missing cells (%): 2.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 26.7 MiB
Average record size in memory: 280.0 B
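
The derived figures in the dataset statistics follow from the raw counts. The sketch below is a minimal check, assuming the missing percentage is missing cells divided by all cells (variables × observations) and that 1 MiB is 2**20 bytes; the variable names are illustrative and not taken from the report.

```python
# Raw figures copied from the dataset statistics.
n_variables = 34
n_observations = 100_000
missing_cells = 100_031
total_size_mib = 26.7

# Share of missing cells over every cell in the table.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average bytes per row, assuming 1 MiB = 2**20 bytes.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.0f} B")
```

Both results round to the values the report shows, which suggests the assumptions hold for this dataset.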

Variable types

Numeric: 4
Categorical: 23
Text: 4
DateTime: 2
Unsupported: 1

Dataset

Description: This profiling report was generated for the End of Module Project for UoL.
URL: (none given)

Alerts

user_id is highly overall correlated with sex and 1 other field [High correlation]
movie_id is highly overall correlated with unknown and 19 other fields [High correlation]
unknown is highly overall correlated with movie_id and 1 other field [High correlation]
Action is highly overall correlated with movie_id and 1 other field [High correlation]
Adventure is highly overall correlated with movie_id and 1 other field [High correlation]
Animation is highly overall correlated with movie_id and 2 other fields [High correlation]
Children's is highly overall correlated with movie_id and 2 other fields [High correlation]
Comedy is highly overall correlated with movie_id and 1 other field [High correlation]
Crime is highly overall correlated with movie_id and 1 other field [High correlation]
Documentary is highly overall correlated with movie_id and 1 other field [High correlation]
Drama is highly overall correlated with movie_id and 1 other field [High correlation]
Fantasy is highly overall correlated with movie_id and 1 other field [High correlation]
Film-Noir is highly overall correlated with movie_id and 1 other field [High correlation]
Horror is highly overall correlated with movie_id and 1 other field [High correlation]
Musical is highly overall correlated with movie_id and 1 other field [High correlation]
Mystery is highly overall correlated with movie_id and 1 other field [High correlation]
Romance is highly overall correlated with movie_id and 1 other field [High correlation]
Sci-Fi is highly overall correlated with movie_id and 1 other field [High correlation]
Thriller is highly overall correlated with movie_id and 1 other field [High correlation]
War is highly overall correlated with movie_id and 1 other field [High correlation]
Western is highly overall correlated with movie_id and 1 other field [High correlation]
genre is highly overall correlated with movie_id and 19 other fields [High correlation]
sex is highly overall correlated with user_id [High correlation]
occupation is highly overall correlated with user_id [High correlation]
unknown is highly imbalanced (99.9%) [Imbalance]
Animation is highly imbalanced (77.6%) [Imbalance]
Children's is highly imbalanced (62.7%) [Imbalance]
Crime is highly imbalanced (59.6%) [Imbalance]
Documentary is highly imbalanced (93.6%) [Imbalance]
Fantasy is highly imbalanced (89.7%) [Imbalance]
Film-Noir is highly imbalanced (87.4%) [Imbalance]
Horror is highly imbalanced (70.0%) [Imbalance]
Musical is highly imbalanced (71.6%) [Imbalance]
Mystery is highly imbalanced (70.3%) [Imbalance]
War is highly imbalanced (55.0%) [Imbalance]
Western is highly imbalanced (86.7%) [Imbalance]
video_release_date has 100000 (100.0%) missing values [Missing]
video_release_date is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
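
The imbalance percentages in these alerts are consistent with a normalized-entropy score, 1 - H / log2(k), where H is the Shannon entropy of the class frequencies in bits and k is the number of classes. That reading is inferred from the numbers in this report, not taken from the tool's configuration, and the function name `imbalance` is illustrative. A minimal sketch using class counts from the per-variable sections:

```python
import math

def imbalance(counts):
    """Return 1 - H / log2(k) for a list of class counts (H in bits)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Counts of the values 0 and 1, copied from this report.
unknown_score = imbalance([99990, 10])
animation_score = imbalance([96395, 3605])
childrens_score = imbalance([92818, 7182])

print(f"unknown:    {unknown_score:.1%}")
print(f"Animation:  {animation_score:.1%}")
print(f"Children's: {childrens_score:.1%}")
```

A score near 100% means almost every row falls in one class; a perfectly balanced variable would score 0%.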

Reproduction

Analysis started: 2023-11-06 22:05:03.825065
Analysis finished: 2023-11-06 22:05:37.017695
Duration: 33.19 seconds
Software version: ydata-profiling v4.6.1
Download configuration: config.json
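
The reported duration is the difference between the two analysis timestamps. A short check with the standard library, using only the values shown in this section:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-11-06 22:05:03.825065")
finished = datetime.fromisoformat("2023-11-06 22:05:37.017695")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```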

Variables

user_id
Real number (ℝ)

HIGH CORRELATION 

Distinct: 943
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 461.48475
Minimum: 0
Maximum: 942
Zeros: 272
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 45
Q1: 253
Median: 446
Q3: 681
95-th percentile: 891
Maximum: 942
Range: 942
Interquartile range (IQR): 428

Descriptive statistics

Standard deviation: 266.61442
Coefficient of variation (CV): 0.57773181
Kurtosis: -1.0973667
Mean: 461.48475
Median Absolute Deviation (MAD): 213
Skewness: 0.082533291
Sum: 46148475
Variance: 71083.249
Monotonicity: Not monotonic
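
Several of the user_id statistics are derived from others in the same block. The sketch below recomputes them from the reported mean, standard deviation, and quartiles, assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1); the variable names are illustrative.

```python
# Values copied from the user_id statistics.
mean = 461.48475
std = 266.61442
q1, q3 = 253, 681
minimum, maximum = 0, 942
n_observations = 100_000

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance from the standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n_observations    # sum of all values

print(f"CV={cv:.8f} variance={variance:.3f} IQR={iqr} range={value_range} sum={total:.0f}")
```

The recomputed values agree with the reported ones up to rounding in the printed standard deviation.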
Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
404                 737     0.7%
654                 685     0.7%
12                  636     0.6%
449                 540     0.5%
275                 518     0.5%
415                 493     0.5%
536                 490     0.5%
302                 484     0.5%
233                 480     0.5%
392                 448     0.4%
Other values (933)  94489   94.5%

Minimum 10 values

Value  Count  Frequency (%)
0      272    0.3%
1      62     0.1%
2      54     0.1%
3      24     < 0.1%
4      175    0.2%
5      211    0.2%
6      403    0.4%
7      59     0.1%
8      22     < 0.1%
9      184    0.2%

Maximum 10 values

Value  Count  Frequency (%)
942    168    0.2%
941    79     0.1%
940    22     < 0.1%
939    107    0.1%
938    49     < 0.1%
937    108    0.1%
936    40     < 0.1%
935    142    0.1%
934    39     < 0.1%
933    174    0.2%

movie_id
Real number (ℝ)

HIGH CORRELATION 

Distinct: 1682
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 424.53013
Minimum: 0
Maximum: 1681
Zeros: 452
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 29
Q1: 174
Median: 321
Q3: 630
95-th percentile: 1073
Maximum: 1681
Range: 1681
Interquartile range (IQR): 456

Descriptive statistics

Standard deviation: 330.79836
Coefficient of variation (CV): 0.77921055
Kurtosis: 0.42253411
Mean: 424.53013
Median Absolute Deviation (MAD): 196
Skewness: 0.9863565
Sum: 42453013
Variance: 109427.55
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count   Frequency (%)
49                   583     0.6%
257                  509     0.5%
99                   508     0.5%
180                  507     0.5%
293                  485     0.5%
285                  481     0.5%
287                  478     0.5%
0                    452     0.5%
299                  431     0.4%
120                  429     0.4%
Other values (1672)  95137   95.1%

Minimum 10 values

Value  Count  Frequency (%)
0      452    0.5%
1      131    0.1%
2      90     0.1%
3      209    0.2%
4      86     0.1%
5      26     < 0.1%
6      392    0.4%
7      219    0.2%
8      299    0.3%
9      89     0.1%

Maximum 10 values

Value  Count  Frequency (%)
1681   1      < 0.1%
1680   1      < 0.1%
1679   1      < 0.1%
1678   1      < 0.1%
1677   1      < 0.1%
1676   1      < 0.1%
1675   1      < 0.1%
1674   1      < 0.1%
1673   1      < 0.1%
1672   1      < 0.1%

rating
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value  Count
4.0    34174
3.0    27145
5.0    21201
2.0    11370
1.0    6110

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 300000
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3.0
2nd row: 2.0
3rd row: 4.0
4th row: 4.0
5th row: 4.0

Common Values

Value  Count  Frequency (%)
4.0    34174  34.2%
3.0    27145  27.1%
5.0    21201  21.2%
2.0    11370  11.4%
1.0    6110   6.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
4.0 34174
34.2%
3.0 27145
27.1%
5.0 21201
21.2%
2.0 11370
 
11.4%
1.0 6110
 
6.1%

Most occurring characters

Value  Count   Frequency (%)
.      100000  33.3%
0      100000  33.3%
4      34174   11.4%
3      27145   9.0%
5      21201   7.1%
2      11370   3.8%
1      6110    2.0%
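
Because rating is profiled as a Categorical variable holding strings such as "4.0", every row contributes one ".", one "0", and one leading digit, which is why "." and "0" each account for a third of all characters. A minimal sketch that rebuilds the character counts from the value counts in this section (the dictionary and variable names are illustrative):

```python
from collections import Counter

# Value counts for rating, copied from the Common Values table.
rating_counts = {"4.0": 34174, "3.0": 27145, "5.0": 21201, "2.0": 11370, "1.0": 6110}

# Each occurrence of a value contributes each of its characters once.
char_counts = Counter()
for value, count in rating_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
for ch, count in char_counts.most_common():
    print(f"{ch!r}: {count} ({count / total_chars:.1%})")
```

If the ratings are needed as numbers, converting the column to a numeric type before profiling would avoid this character-level view.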

Most occurring categories

ValueCountFrequency (%)
Decimal Number 200000
66.7%
Other Punctuation 100000
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 100000
50.0%
4 34174
 
17.1%
3 27145
 
13.6%
5 21201
 
10.6%
2 11370
 
5.7%
1 6110
 
3.1%
Other Punctuation
ValueCountFrequency (%)
. 100000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 300000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 100000
33.3%
0 100000
33.3%
4 34174
 
11.4%
3 27145
 
9.0%
5 21201
 
7.1%
2 11370
 
3.8%
1 6110
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 300000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 100000
33.3%
0 100000
33.3%
4 34174
 
11.4%
3 27145
 
9.0%
5 21201
 
7.1%
2 11370
 
3.8%
1 6110
 
2.0%

unix_timestamp
Real number (ℝ)

Distinct: 49282
Distinct (%): 49.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.8352885 × 10^8
Minimum: 8.7472471 × 10^8
Maximum: 8.9328664 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 8.7472471 × 10^8
5-th percentile: 8.7532031 × 10^8
Q1: 8.7944871 × 10^8
Median: 8.8282694 × 10^8
Q3: 8.8825998 × 10^8
95-th percentile: 8.9171789 × 10^8
Maximum: 8.9328664 × 10^8
Range: 18561928
Interquartile range (IQR): 8811274.5

Descriptive statistics

Standard deviation: 5343856.2
Coefficient of variation (CV): 0.0060483098
Kurtosis: -1.1687487
Mean: 8.8352885 × 10^8
Median Absolute Deviation (MAD): 3886481
Skewness: 0.1738863
Sum: 8.8352885 × 10^13
Variance: 2.8556799 × 10^13
Monotonicity: Not monotonic
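
The statistics for unix_timestamp are easier to read as calendar dates. The sketch below converts the smallest and largest timestamps to UTC, assuming the values are seconds since the Unix epoch; the exact integers are the first and last entries in the extreme-value listings for this variable.

```python
from datetime import datetime, timezone

# Smallest and largest unix_timestamp values reported for this variable.
first_ts = 874724710
last_ts = 893286638

first = datetime.fromtimestamp(first_ts, tz=timezone.utc)
last = datetime.fromtimestamp(last_ts, tz=timezone.utc)
span = last - first

print("First rating:", first.isoformat())
print("Last rating: ", last.isoformat())
print("Span:", span.days, "days")
```

The difference between the two timestamps equals the reported range of 18561928 seconds, a little over seven months.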
Histogram with fixed size bins (bins=50)

Common values

Value                 Count   Frequency (%)
891033606             12      < 0.1%
878962496             10      < 0.1%
881107817             10      < 0.1%
888637768             10      < 0.1%
876896210             10      < 0.1%
891331825             10      < 0.1%
889665232             10      < 0.1%
891034835             10      < 0.1%
884902317             10      < 0.1%
884901497             10      < 0.1%
Other values (49272)  99898   99.9%

Minimum 10 values

Value      Count  Frequency (%)
874724710  1      < 0.1%
874724727  1      < 0.1%
874724754  1      < 0.1%
874724781  1      < 0.1%
874724843  1      < 0.1%
874724882  2      < 0.1%
874724905  1      < 0.1%
874724937  1      < 0.1%
874724988  1      < 0.1%
874725081  1      < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
893286638  7      < 0.1%
893286637  3      < 0.1%
893286603  1      < 0.1%
893286584  1      < 0.1%
893286550  3      < 0.1%
893286511  2      < 0.1%
893286502  1      < 0.1%
893286501  3      < 0.1%
893286491  1      < 0.1%
893286373  1      < 0.1%

title
Text

Distinct: 1664
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Length

Max length: 81
Median length: 61
Mean length: 22.78191
Min length: 7

Characters and Unicode

Total characters: 2278191
Distinct characters: 79
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 134
Unique (%): 0.1%

Sample

1st row: Kolya (1996)
2nd row: Men in Black (1997)
3rd row: Truth About Cats & Dogs, The (1996)
4th row: Birdcage, The (1996)
5th row: Adventures of Priscilla, Queen of the Desert, The (1994)

Value                Count    Frequency (%)
the                  32193    8.5%
1996                 18745    5.0%
1997                 15384    4.1%
1995                 12408    3.3%
1994                 9034     2.4%
of                   7065     1.9%
1993                 6671     1.8%
and                  4828     1.3%
a                    4087     1.1%
in                   3359     0.9%
Other values (2470)  264486   69.9%

Most occurring characters

ValueCountFrequency (%)
(space) 278275
 
12.2%
9 174495
 
7.7%
e 159599
 
7.0%
1 106420
 
4.7%
( 101985
 
4.5%
) 101985
 
4.5%
a 101802
 
4.5%
n 88173
 
3.9%
r 87855
 
3.9%
o 86945
 
3.8%
Other values (69) 990657
43.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1101108
48.3%
Decimal Number 407113
 
17.9%
Space Separator 278275
 
12.2%
Uppercase Letter 244851
 
10.7%
Open Punctuation 101985
 
4.5%
Close Punctuation 101985
 
4.5%
Other Punctuation 41985
 
1.8%
Dash Punctuation 889
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 159599
14.5%
a 101802
9.2%
n 88173
 
8.0%
r 87855
 
8.0%
o 86945
 
7.9%
i 82866
 
7.5%
t 78420
 
7.1%
s 61152
 
5.6%
h 58761
 
5.3%
l 50123
 
4.6%
Other values (19) 245412
22.3%
Uppercase Letter
ValueCountFrequency (%)
T 34916
14.3%
S 22002
 
9.0%
M 15819
 
6.5%
B 15613
 
6.4%
C 14509
 
5.9%
A 14018
 
5.7%
D 13999
 
5.7%
F 13517
 
5.5%
L 11139
 
4.5%
W 10810
 
4.4%
Other values (17) 78509
32.1%
Decimal Number
ValueCountFrequency (%)
9 174495
42.9%
1 106420
26.1%
6 25582
 
6.3%
7 25486
 
6.3%
5 18288
 
4.5%
8 16283
 
4.0%
4 15356
 
3.8%
3 11184
 
2.7%
2 7195
 
1.8%
0 6824
 
1.7%
Other Punctuation
ValueCountFrequency (%)
, 24238
57.7%
: 5024
 
12.0%
' 4877
 
11.6%
. 4808
 
11.5%
& 1220
 
2.9%
! 747
 
1.8%
* 627
 
1.5%
/ 381
 
0.9%
? 63
 
0.2%
Space Separator
ValueCountFrequency (%)
(space) 278275
100.0%
Open Punctuation
ValueCountFrequency (%)
( 101985
100.0%
Close Punctuation
ValueCountFrequency (%)
) 101985
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 889
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1345959
59.1%
Common 932232
40.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 159599
 
11.9%
a 101802
 
7.6%
n 88173
 
6.6%
r 87855
 
6.5%
o 86945
 
6.5%
i 82866
 
6.2%
t 78420
 
5.8%
s 61152
 
4.5%
h 58761
 
4.4%
l 50123
 
3.7%
Other values (46) 490263
36.4%
Common
ValueCountFrequency (%)
(space) 278275
29.9%
9 174495
18.7%
1 106420
 
11.4%
( 101985
 
10.9%
) 101985
 
10.9%
6 25582
 
2.7%
7 25486
 
2.7%
, 24238
 
2.6%
5 18288
 
2.0%
8 16283
 
1.7%
Other values (13) 59195
 
6.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2278110
> 99.9%
None 81
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 278275
 
12.2%
9 174495
 
7.7%
e 159599
 
7.0%
1 106420
 
4.7%
( 101985
 
4.5%
) 101985
 
4.5%
a 101802
 
4.5%
n 88173
 
3.9%
r 87855
 
3.9%
o 86945
 
3.8%
Other values (65) 990576
43.5%
None
ValueCountFrequency (%)
é 75
92.6%
è 4
 
4.9%
Á 1
 
1.2%
ö 1
 
1.2%

Distinct: 240
Distinct (%): 0.2%
Missing: 9
Missing (%): < 0.1%
Memory size: 1.5 MiB
Minimum: 1922-01-01 00:00:00
Maximum: 1998-10-23 00:00:00
Histogram with fixed size bins (bins=50)

video_release_date
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 100000
Missing (%): 100.0%
Memory size: 1.5 MiB

Distinct: 1660
Distinct (%): 1.7%
Missing: 13
Missing (%): < 0.1%
Memory size: 1.5 MiB

Length

Max length: 134
Median length: 98
Mean length: 60.15346
Min length: 36

Characters and Unicode

Total characters: 6014564
Distinct characters: 76
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 134
Unique (%): 0.1%

Sample

1st row: http://us.imdb.com/M/title-exact?Kolya%20(1996)
2nd row: http://us.imdb.com/M/title-exact?Men+in+Black+(1997)
3rd row: http://us.imdb.com/M/title-exact?Truth%20About%20Cats%20&%20Dogs,%20The%20(1996)
4th row: http://us.imdb.com/M/title-exact?Birdcage,%20The%20(1996)
5th row: http://us.imdb.com/M/title-exact?Adventures%20of%20Priscilla,%20Queen%20of%20the%20Desert,%20The%20(1994)
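
The sample URLs mix two encodings for spaces (`%20` and `+`), which inflates the counts of "%", "2", "0", and "+" in the character statistics for this variable. A minimal sketch of decoding the query part back into a readable title with the standard library; the helper name `title_from_url` is illustrative and not part of the report.

```python
from urllib.parse import urlsplit, unquote_plus

def title_from_url(url: str) -> str:
    """Decode the query string of a sample URL into a readable title."""
    # unquote_plus handles both %20 and + as spaces.
    return unquote_plus(urlsplit(url).query)

samples = [
    "http://us.imdb.com/M/title-exact?Kolya%20(1996)",
    "http://us.imdb.com/M/title-exact?Men+in+Black+(1997)",
    "http://us.imdb.com/M/title-exact?Truth%20About%20Cats%20&%20Dogs,%20The%20(1996)",
]
decoded = [title_from_url(url) for url in samples]
for title in decoded:
    print(title)
```

The decoded strings match the title samples shown earlier in the report, so the URL carries no information beyond the title for these rows.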
ValueCountFrequency (%)
http://us.imdb.com/m/title-exact?star%20wars%20(1977 583
 
0.6%
http://us.imdb.com/title?contact+(1997/i 509
 
0.5%
http://us.imdb.com/m/title-exact?fargo%20(1996 508
 
0.5%
http://us.imdb.com/m/title-exact?return%20of%20the%20jedi%20(1983 507
 
0.5%
http://us.imdb.com/title?liar+liar+(1997 485
 
0.5%
http://us.imdb.com/m/title-exact?english%20patient,%20the%20(1996 481
 
0.5%
http://us.imdb.com/m/title-exact?scream%20(1996 478
 
0.5%
http://us.imdb.com/m/title-exact?toy%20story%20(1995 452
 
0.5%
http://us.imdb.com/m/title-exact?air+force+one+(1997 431
 
0.4%
http://us.imdb.com/m/title-exact?independence%20day%20(1996 429
 
0.4%
Other values (1651) 95152
95.1%

Most occurring characters

ValueCountFrequency (%)
t 573085
 
9.5%
/ 398144
 
6.6%
e 350899
 
5.8%
i 284092
 
4.7%
2 257527
 
4.3%
% 250211
 
4.2%
0 245929
 
4.1%
c 223080
 
3.7%
m 221714
 
3.7%
. 202953
 
3.4%
Other values (66) 3006930
50.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3361616
55.9%
Other Punctuation 1081180
 
18.0%
Decimal Number 904928
 
15.0%
Uppercase Letter 342634
 
5.7%
Dash Punctuation 101227
 
1.7%
Open Punctuation 95825
 
1.6%
Close Punctuation 95825
 
1.6%
Math Symbol 31301
 
0.5%
Space Separator 28
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 573085
17.0%
e 350899
10.4%
i 284092
 
8.5%
c 223080
 
6.6%
m 221714
 
6.6%
a 195679
 
5.8%
o 184357
 
5.5%
s 159368
 
4.7%
h 156021
 
4.6%
l 151145
 
4.5%
Other values (16) 862176
25.6%
Uppercase Letter
ValueCountFrequency (%)
M 111140
32.4%
T 36507
 
10.7%
S 21446
 
6.3%
C 16953
 
4.9%
B 15232
 
4.4%
A 14554
 
4.2%
F 13239
 
3.9%
D 12893
 
3.8%
L 10599
 
3.1%
P 10496
 
3.1%
Other values (16) 79575
23.2%
Decimal Number
ValueCountFrequency (%)
2 257527
28.5%
0 245929
27.2%
9 174360
19.3%
1 108125
11.9%
6 26434
 
2.9%
7 25359
 
2.8%
8 19692
 
2.2%
5 18823
 
2.1%
4 15426
 
1.7%
3 13253
 
1.5%
Other Punctuation
ValueCountFrequency (%)
/ 398144
36.8%
% 250211
23.1%
. 202953
18.8%
: 104429
 
9.7%
? 99999
 
9.2%
, 20627
 
1.9%
' 3472
 
0.3%
& 804
 
0.1%
! 541
 
0.1%
Dash Punctuation
ValueCountFrequency (%)
- 101227
100.0%
Open Punctuation
ValueCountFrequency (%)
( 95825
100.0%
Close Punctuation
ValueCountFrequency (%)
) 95825
100.0%
Math Symbol
ValueCountFrequency (%)
+ 31301
100.0%
Space Separator
ValueCountFrequency (%)
(space) 28
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3704250
61.6%
Common 2310314
38.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 573085
15.5%
e 350899
 
9.5%
i 284092
 
7.7%
c 223080
 
6.0%
m 221714
 
6.0%
a 195679
 
5.3%
o 184357
 
5.0%
s 159368
 
4.3%
h 156021
 
4.2%
l 151145
 
4.1%
Other values (42) 1204810
32.5%
Common
ValueCountFrequency (%)
/ 398144
17.2%
2 257527
11.1%
% 250211
10.8%
0 245929
10.6%
. 202953
8.8%
9 174360
7.5%
1 108125
 
4.7%
: 104429
 
4.5%
- 101227
 
4.4%
? 99999
 
4.3%
Other values (14) 367410
15.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6014564
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 573085
 
9.5%
/ 398144
 
6.6%
e 350899
 
5.8%
i 284092
 
4.7%
2 257527
 
4.3%
% 250211
 
4.2%
0 245929
 
4.1%
c 223080
 
3.7%
m 221714
 
3.7%
. 202953
 
3.4%
Other values (66) 3006930
50.0%

unknown
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value  Count
0      99990
1      10

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 99990
> 99.9%
1 10
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 99990
> 99.9%
1 10
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0 99990
> 99.9%
1 10
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 99990
> 99.9%
1 10
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 99990
> 99.9%
1 10
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 99990
> 99.9%
1 10
 
< 0.1%

Action
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value  Count
0      74411
1      25589

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row1
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 74411
74.4%
1 25589
 
25.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 74411
74.4%
1 25589
 
25.6%

Most occurring characters

ValueCountFrequency (%)
0 74411
74.4%
1 25589
 
25.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 74411
74.4%
1 25589
 
25.6%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 74411
74.4%
1 25589
 
25.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 74411
74.4%
1 25589
 
25.6%

Adventure
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value  Count
0      86247
1      13753

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row1
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 86247
86.2%
1 13753
 
13.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 86247
86.2%
1 13753
 
13.8%

Most occurring characters

ValueCountFrequency (%)
0 86247
86.2%
1 13753
 
13.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 86247
86.2%
1 13753
 
13.8%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 86247
86.2%
1 13753
 
13.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 86247
86.2%
1 13753
 
13.8%

Animation
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value  Count
0      96395
1      3605

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 96395
96.4%
1 3605
 
3.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 96395
96.4%
1 3605
 
3.6%

Most occurring characters

ValueCountFrequency (%)
0 96395
96.4%
1 3605
 
3.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 96395
96.4%
1 3605
 
3.6%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 96395
96.4%
1 3605
 
3.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 96395
96.4%
1 3605
 
3.6%

Children's
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value  Count
0      92818
1      7182

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 92818
92.8%
1 7182
 
7.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 92818
92.8%
1 7182
 
7.2%

Most occurring characters

ValueCountFrequency (%)
0 92818
92.8%
1 7182
 
7.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 92818
92.8%
1 7182
 
7.2%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 92818
92.8%
1 7182
 
7.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 92818
92.8%
1 7182
 
7.2%

Comedy
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value  Count
0      70168
1      29832

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
0 70168
70.2%
1 29832
29.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 70168
70.2%
1 29832
29.8%

Most occurring characters

ValueCountFrequency (%)
0 70168
70.2%
1 29832
29.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 70168
70.2%
1 29832
29.8%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 70168
70.2%
1 29832
29.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 70168
70.2%
1 29832
29.8%

Crime
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value  Count
0      91945
1      8055

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 91945
91.9%
1 8055
 
8.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 91945
91.9%
1 8055
 
8.1%

Most occurring characters

ValueCountFrequency (%)
0 91945
91.9%
1 8055
 
8.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 91945
91.9%
1 8055
 
8.1%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 91945
91.9%
1 8055
 
8.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 91945
91.9%
1 8055
 
8.1%

Documentary
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value  Count
0      99242
1      758

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 99242
99.2%
1 758
 
0.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 99242
99.2%
1 758
 
0.8%

Most occurring characters

ValueCountFrequency (%)
0 99242
99.2%
1 758
 
0.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 99242
99.2%
1 758
 
0.8%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 99242
99.2%
1 758
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 99242
99.2%
1 758
 
0.8%

Drama
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value  Count
0      60105
1      39895

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row1

Common Values

ValueCountFrequency (%)
0 60105
60.1%
1 39895
39.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 60105
60.1%
1 39895
39.9%

Most occurring characters

ValueCountFrequency (%)
0 60105
60.1%
1 39895
39.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 60105
60.1%
1 39895
39.9%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 60105
60.1%
1 39895
39.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 60105
60.1%
1 39895
39.9%

Fantasy
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Value  Count
0      98648
1      1352

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 98648
98.6%
1 1352
 
1.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 98648
98.6%
1 1352
 
1.4%

Most occurring characters

ValueCountFrequency (%)
0 98648
98.6%
1 1352
 
1.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 98648
98.6%
1 1352
 
1.4%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 98648
98.6%
1 1352
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 98648
98.6%
1 1352
 
1.4%

Film-Noir
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value counts: 0 (98267), 1 (1733)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 98267
98.3%
1 1733
 
1.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 98267
98.3%
1 1733
 
1.7%

Most occurring characters

ValueCountFrequency (%)
0 98267
98.3%
1 1733
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 98267
98.3%
1 1733
 
1.7%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 98267
98.3%
1 1733
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 98267
98.3%
1 1733
 
1.7%

Horror
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value counts: 0 (94683), 1 (5317)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 94683
94.7%
1 5317
 
5.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 94683
94.7%
1 5317
 
5.3%

Most occurring characters

ValueCountFrequency (%)
0 94683
94.7%
1 5317
 
5.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 94683
94.7%
1 5317
 
5.3%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 94683
94.7%
1 5317
 
5.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 94683
94.7%
1 5317
 
5.3%

Musical
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value counts: 0 (95046), 1 (4954)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 95046
95.0%
1 4954
 
5.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 95046
95.0%
1 4954
 
5.0%

Most occurring characters

ValueCountFrequency (%)
0 95046
95.0%
1 4954
 
5.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 95046
95.0%
1 4954
 
5.0%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 95046
95.0%
1 4954
 
5.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 95046
95.0%
1 4954
 
5.0%

Mystery
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value counts: 0 (94755), 1 (5245)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 94755
94.8%
1 5245
 
5.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 94755
94.8%
1 5245
 
5.2%

Most occurring characters

ValueCountFrequency (%)
0 94755
94.8%
1 5245
 
5.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 94755
94.8%
1 5245
 
5.2%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 94755
94.8%
1 5245
 
5.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 94755
94.8%
1 5245
 
5.2%

Romance
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value counts: 0 (80539), 1 (19461)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row1
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 80539
80.5%
1 19461
 
19.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 80539
80.5%
1 19461
 
19.5%

Most occurring characters

ValueCountFrequency (%)
0 80539
80.5%
1 19461
 
19.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 80539
80.5%
1 19461
 
19.5%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 80539
80.5%
1 19461
 
19.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 80539
80.5%
1 19461
 
19.5%

Sci-Fi
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value counts: 0 (87270), 1 (12730)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row1
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 87270
87.3%
1 12730
 
12.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 87270
87.3%
1 12730
 
12.7%

Most occurring characters

ValueCountFrequency (%)
0 87270
87.3%
1 12730
 
12.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 87270
87.3%
1 12730
 
12.7%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 87270
87.3%
1 12730
 
12.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 87270
87.3%
1 12730
 
12.7%

Thriller
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value counts: 0 (78128), 1 (21872)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 78128
78.1%
1 21872
 
21.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 78128
78.1%
1 21872
 
21.9%

Most occurring characters

ValueCountFrequency (%)
0 78128
78.1%
1 21872
 
21.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 78128
78.1%
1 21872
 
21.9%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 78128
78.1%
1 21872
 
21.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 78128
78.1%
1 21872
 
21.9%

War
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value counts: 0 (90602), 1 (9398)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 90602
90.6%
1 9398
 
9.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 90602
90.6%
1 9398
 
9.4%

Most occurring characters

ValueCountFrequency (%)
0 90602
90.6%
1 9398
 
9.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 90602
90.6%
1 9398
 
9.4%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 90602
90.6%
1 9398
 
9.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 90602
90.6%
1 9398
 
9.4%

Western
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value counts: 0 (98146), 1 (1854)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 98146
98.1%
1 1854
 
1.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 98146
98.1%
1 1854
 
1.9%

Most occurring characters

ValueCountFrequency (%)
0 98146
98.1%
1 1854
 
1.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 98146
98.1%
1 1854
 
1.9%

Most occurring scripts

ValueCountFrequency (%)
Common 100000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 98146
98.1%
1 1854
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 98146
98.1%
1 1854
 
1.9%

year
Date

Distinct: 71
Distinct (%): 0.1%
Missing: 9
Missing (%): < 0.1%
Memory size: 1.5 MiB
Minimum: 1922-01-01 00:00:00
Maximum: 1998-01-01 00:00:00
Histogram with fixed size bins (bins=50)

genre
Categorical

HIGH CORRELATION 

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value counts: Drama (23849), Comedy (17935), Thriller (12854), Romance (8208), Action (7371), Other values (14) (29783)

Length

Max length11
Median length10
Mean length6.41247
Min length3

Characters and Unicode

Total characters641247
Distinct characters31
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
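
As a rough illustration of the character categories counted in this section, the sketch below uses Python's standard `unicodedata` module to tally general categories over a few genre labels. The helper name `category_counts` and the short name mapping are illustrative assumptions, not the profiler's own code.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes seen in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not in the mapping.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = ["Comedy", "Sci-Fi", "Children's"]
print(category_counts(sample))
```

The hyphen in "Sci-Fi" falls under Dash Punctuation and the apostrophe in "Children's" under Other Punctuation, which is why the genre column reports four distinct categories.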

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowComedy
2nd rowAction
3rd rowComedy
4th rowComedy
5th rowComedy

Common Values

Value Count Frequency (%)
Drama 23849 23.8%
Comedy 17935 17.9%
Thriller 12854 12.9%
Romance 8208 8.2%
Action 7371 7.4%
Adventure 5963 6.0%
Sci-Fi 3973 4.0%
Children's 3720 3.7%
War 3681 3.7%
Crime 2886 2.9%
Other values (9) 9560 9.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
drama 23849
23.8%
comedy 17935
17.9%
thriller 12854
12.9%
romance 8208
 
8.2%
action 7371
 
7.4%
adventure 5963
 
6.0%
sci-fi 3973
 
4.0%
children's 3720
 
3.7%
war 3681
 
3.7%
crime 2886
 
2.9%
Other values (9) 9560
9.6%

Most occurring characters

ValueCountFrequency (%)
r 78358
12.2%
a 63340
 
9.9%
e 62504
 
9.7%
m 54886
 
8.6%
o 41028
 
6.4%
i 38695
 
6.0%
l 31515
 
4.9%
n 29049
 
4.5%
d 27618
 
4.3%
D 24605
 
3.8%
Other values (21) 189649
29.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 528245
82.4%
Uppercase Letter 104636
 
16.3%
Dash Punctuation 4646
 
0.7%
Other Punctuation 3720
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 78358
14.8%
a 63340
12.0%
e 62504
11.8%
m 54886
10.4%
o 41028
7.8%
i 38695
7.3%
l 31515
 
6.0%
n 29049
 
5.5%
d 27618
 
5.2%
y 22267
 
4.2%
Other values (8) 78985
15.0%
Uppercase Letter
ValueCountFrequency (%)
D 24605
23.5%
C 24541
23.5%
A 13913
13.3%
T 12854
12.3%
R 8208
 
7.8%
F 5148
 
4.9%
W 5022
 
4.8%
S 3973
 
3.8%
M 2951
 
2.8%
H 2748
 
2.6%
Dash Punctuation
ValueCountFrequency (%)
- 4646
100.0%
Other Punctuation
ValueCountFrequency (%)
' 3720
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 632881
98.7%
Common 8366
 
1.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 78358
12.4%
a 63340
 
10.0%
e 62504
 
9.9%
m 54886
 
8.7%
o 41028
 
6.5%
i 38695
 
6.1%
l 31515
 
5.0%
n 29049
 
4.6%
d 27618
 
4.4%
D 24605
 
3.9%
Other values (19) 181283
28.6%
Common
ValueCountFrequency (%)
- 4646
55.5%
' 3720
44.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 641247
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 78358
12.2%
a 63340
 
9.9%
e 62504
 
9.7%
m 54886
 
8.6%
o 41028
 
6.4%
i 38695
 
6.0%
l 31515
 
4.9%
n 29049
 
4.5%
d 27618
 
4.3%
D 24605
 
3.8%
Other values (21) 189649
29.6%

all_genres
Text

Distinct: 216
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Length

Max length50
Median length40
Mean length14.78432
Min length3

Characters and Unicode

Total characters1478432
Distinct characters31
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)< 0.1%

Sample

1st rowComedy
2nd rowAction-Adventure-Comedy-Sci-Fi
3rd rowComedy-Romance
4th rowComedy
5th rowComedy-Drama
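
The sample values join genre names with hyphens, but "Sci-Fi" and "Film-Noir" contain hyphens themselves, so a plain split on "-" would break them apart. A minimal sketch of one way to split such a label safely, assuming the label is built only from the genre column names shown in this report; the function `split_genres` is hypothetical.

```python
# Genre names as they appear as column names in the report.
GENRES = [
    "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


def split_genres(label, known=GENRES):
    """Split a hyphen-joined label into known genre names, longest name first."""
    names = sorted(known, key=len, reverse=True)
    result = []
    rest = label
    while rest:
        for name in names:
            # Accept a name only if it ends the label or is followed by a separator.
            if rest == name or rest.startswith(name + "-"):
                result.append(name)
                rest = rest[len(name) + 1:]
                break
        else:
            raise ValueError(f"unrecognized genre in {label!r}")
    return result


print(split_genres("Action-Adventure-Comedy-Sci-Fi"))
```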
Value Count Frequency (%)
drama 13257 13.3%
comedy 9828 9.8%
comedy-romance 5055 5.1%
drama-romance 4767 4.8%
action-thriller 3550 3.5%
drama-thriller 2627 2.6%
comedy-drama 2422 2.4%
drama-war 2012 2.0%
action-adventure-sci-fi 1865 1.9%
horror 1558 1.6%
Other values (206) 53059 53.1%

Most occurring characters

ValueCountFrequency (%)
r 147568
 
10.0%
- 127058
 
8.6%
e 123619
 
8.4%
a 120670
 
8.2%
i 103788
 
7.0%
m 103339
 
7.0%
o 91622
 
6.2%
n 77189
 
5.2%
c 63492
 
4.3%
l 57613
 
3.9%
Other values (21) 462474
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1117144
75.6%
Uppercase Letter 227048
 
15.4%
Dash Punctuation 127058
 
8.6%
Other Punctuation 7182
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 147568
13.2%
e 123619
11.1%
a 120670
10.8%
i 103788
9.3%
m 103339
9.3%
o 91622
8.2%
n 77189
6.9%
c 63492
 
5.7%
l 57613
 
5.2%
t 52156
 
4.7%
Other values (8) 176088
15.8%
Uppercase Letter
ValueCountFrequency (%)
C 45069
19.8%
A 42947
18.9%
D 40653
17.9%
T 21872
9.6%
R 19461
8.6%
F 15815
 
7.0%
S 12730
 
5.6%
W 11252
 
5.0%
M 10199
 
4.5%
H 5317
 
2.3%
Dash Punctuation
ValueCountFrequency (%)
- 127058
100.0%
Other Punctuation
ValueCountFrequency (%)
' 7182
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1344192
90.9%
Common 134240
 
9.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 147568
 
11.0%
e 123619
 
9.2%
a 120670
 
9.0%
i 103788
 
7.7%
m 103339
 
7.7%
o 91622
 
6.8%
n 77189
 
5.7%
c 63492
 
4.7%
l 57613
 
4.3%
t 52156
 
3.9%
Other values (19) 403136
30.0%
Common
ValueCountFrequency (%)
- 127058
94.6%
' 7182
 
5.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1478432
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 147568
 
10.0%
- 127058
 
8.6%
e 123619
 
8.4%
a 120670
 
8.2%
i 103788
 
7.0%
m 103339
 
7.0%
o 91622
 
6.2%
n 77189
 
5.2%
c 63492
 
4.3%
l 57613
 
3.9%
Other values (21) 462474
31.3%

age
Real number (ℝ)

Distinct: 61
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.96985
Minimum: 7
Maximum: 73
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 7
5th percentile: 19
Q1: 24
Median: 30
Q3: 40
95th percentile: 55
Maximum: 73
Range: 66
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 11.562623
Coefficient of variation (CV): 0.35070294
Kurtosis: -0.1684152
Mean: 32.96985
Median Absolute Deviation (MAD): 8
Skewness: 0.7331067
Sum: 3296985
Variance: 133.69426
Monotonicity: Not monotonic
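
Several of the figures for age follow from one another. The sketch below recomputes the derived values from the reported mean, standard deviation, quartiles and extremes, using the standard definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum). The function name `derived_stats` is illustrative only.

```python
def derived_stats(mean, std, q1, q3, minimum, maximum, n):
    """Recompute statistics that are functions of the other reported values."""
    return {
        "cv": std / mean,            # coefficient of variation
        "iqr": q3 - q1,              # interquartile range
        "range": maximum - minimum,  # spread between the extremes
        "sum": mean * n,             # total over all observations
        "variance": std ** 2,        # square of the standard deviation
    }


# Inputs copied from the age section of the report (100000 observations).
stats = derived_stats(mean=32.96985, std=11.562623, q1=24, q3=40,
                      minimum=7, maximum=73, n=100000)
for key, value in stats.items():
    print(key, value)
```

Small differences in the last digits are expected because the inputs are themselves rounded.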
Histogram with fixed size bins (bins=50)
Most frequent values
Value Count Frequency (%)
27 6423 6.4%
24 4556 4.6%
20 4089 4.1%
25 4013 4.0%
22 3979 4.0%
30 3762 3.8%
29 3650 3.6%
28 3619 3.6%
32 3526 3.5%
19 3514 3.5%
Other values (51) 58869 58.9%

Smallest values
Value Count Frequency (%)
7 43 < 0.1%
10 31 < 0.1%
11 27 < 0.1%
13 497 0.5%
14 264 0.3%
15 397 0.4%
16 335 0.3%
17 897 0.9%
18 2219 2.2%
19 3514 3.5%

Largest values
Value Count Frequency (%)
73 56 0.1%
70 141 0.1%
69 156 0.2%
68 92 0.1%
66 37 < 0.1%
65 229 0.2%
64 95 0.1%
63 77 0.1%
62 46 < 0.1%
61 282 0.3%

sex
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value counts: M (74260), F (25740)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters100000
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowM
2nd rowM
3rd rowM
4th rowM
5th rowM

Common Values

ValueCountFrequency (%)
M 74260
74.3%
F 25740
 
25.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
m 74260
74.3%
f 25740
 
25.7%

Most occurring characters

ValueCountFrequency (%)
M 74260
74.3%
F 25740
 
25.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 100000
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M 74260
74.3%
F 25740
 
25.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 100000
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
M 74260
74.3%
F 25740
 
25.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M 74260
74.3%
F 25740
 
25.7%

occupation
Categorical

HIGH CORRELATION 

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Value counts: student (21957), other (10663), educator (9442), engineer (8175), programmer (7801), Other values (16) (41962)

Length

Max length13
Median length9
Mean length8.10458
Min length4

Characters and Unicode

Total characters810458
Distinct characters22
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowwriter
2nd rowwriter
3rd rowwriter
4th rowwriter
5th rowwriter

Common Values

Value Count Frequency (%)
student 21957 22.0%
other 10663 10.7%
educator 9442 9.4%
engineer 8175 8.2%
programmer 7801 7.8%
administrator 7479 7.5%
writer 5536 5.5%
librarian 5273 5.3%
technician 3506 3.5%
executive 3403 3.4%
Other values (11) 16765 16.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
student 21957
22.0%
other 10663
10.7%
educator 9442
9.4%
engineer 8175
 
8.2%
programmer 7801
 
7.8%
administrator 7479
 
7.5%
writer 5536
 
5.5%
librarian 5273
 
5.3%
technician 3506
 
3.5%
executive 3403
 
3.4%
Other values (11) 16765
16.8%

Most occurring characters

ValueCountFrequency (%)
e 116458
14.4%
t 113342
14.0%
r 102818
12.7%
n 71022
8.8%
i 61708
7.6%
a 61570
7.6%
d 41027
 
5.1%
o 37665
 
4.6%
s 37572
 
4.6%
u 34802
 
4.3%
Other values (12) 132474
16.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 810458
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 116458
14.4%
t 113342
14.0%
r 102818
12.7%
n 71022
8.8%
i 61708
7.6%
a 61570
7.6%
d 41027
 
5.1%
o 37665
 
4.6%
s 37572
 
4.6%
u 34802
 
4.3%
Other values (12) 132474
16.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 810458
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 116458
14.4%
t 113342
14.0%
r 102818
12.7%
n 71022
8.8%
i 61708
7.6%
a 61570
7.6%
d 41027
 
5.1%
o 37665
 
4.6%
s 37572
 
4.6%
u 34802
 
4.3%
Other values (12) 132474
16.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 810458
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 116458
14.4%
t 113342
14.0%
r 102818
12.7%
n 71022
8.8%
i 61708
7.6%
a 61570
7.6%
d 41027
 
5.1%
o 37665
 
4.6%
s 37572
 
4.6%
u 34802
 
4.3%
Other values (12) 132474
16.3%

zip_code
Text

Distinct: 795
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB

Length

Max length5
Median length5
Mean length5
Min length5

Characters and Unicode

Total characters500000
Distinct characters26
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row55105
2nd row55105
3rd row55105
4th row55105
5th row55105
Value Count Frequency (%)
55414 1103 1.1%
20009 878 0.9%
10019 850 0.9%
22902 832 0.8%
61820 817 0.8%
48103 746 0.7%
10003 736 0.7%
60657 685 0.7%
80525 678 0.7%
83702 639 0.6%
Other values (785) 92036 92.0%

Most occurring characters

ValueCountFrequency (%)
0 84079
16.8%
1 66289
13.3%
2 58132
11.6%
5 53721
10.7%
4 45130
9.0%
3 43212
8.6%
9 38841
7.8%
7 35743
7.1%
6 35322
7.1%
8 33273
 
6.7%
Other values (16) 6258
 
1.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 493742
98.7%
Uppercase Letter 6258
 
1.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 1262
20.2%
A 812
13.0%
H 627
10.0%
V 612
9.8%
L 569
9.1%
E 534
8.5%
T 359
 
5.7%
P 316
 
5.0%
B 309
 
4.9%
R 214
 
3.4%
Other values (6) 644
10.3%
Decimal Number
ValueCountFrequency (%)
0 84079
17.0%
1 66289
13.4%
2 58132
11.8%
5 53721
10.9%
4 45130
9.1%
3 43212
8.8%
9 38841
7.9%
7 35743
7.2%
6 35322
7.2%
8 33273
 
6.7%

Most occurring scripts

ValueCountFrequency (%)
Common 493742
98.7%
Latin 6258
 
1.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 1262
20.2%
A 812
13.0%
H 627
10.0%
V 612
9.8%
L 569
9.1%
E 534
8.5%
T 359
 
5.7%
P 316
 
5.0%
B 309
 
4.9%
R 214
 
3.4%
Other values (6) 644
10.3%
Common
ValueCountFrequency (%)
0 84079
17.0%
1 66289
13.4%
2 58132
11.8%
5 53721
10.9%
4 45130
9.1%
3 43212
8.8%
9 38841
7.9%
7 35743
7.2%
6 35322
7.2%
8 33273
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 500000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 84079
16.8%
1 66289
13.3%
2 58132
11.6%
5 53721
10.7%
4 45130
9.0%
3 43212
8.6%
9 38841
7.8%
7 35743
7.1%
6 35322
7.1%
8 33273
 
6.7%
Other values (16) 6258
 
1.3%

Interactions

[16 interaction plots; images not reproduced]

Correlations

Columns, in order: user_id movie_id unix_timestamp age rating unknown Action Adventure Animation Children's Comedy Crime Documentary Drama Fantasy Film-Noir Horror Musical Mystery Romance Sci-Fi Thriller War Western genre sex occupation

user_id 1.000 -0.007 0.038 -0.005 0.278 0.000 0.227 0.155 0.139 0.185 0.176 0.094 0.061 0.207 0.000 0.072 0.207 0.102 0.119 0.118 0.142 0.177 0.083 0.040 0.085 0.995 0.995
movie_id -0.007 1.000 0.025 0.010 0.260 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.992 0.146 0.052
unix_timestamp 0.038 0.025 1.000 0.122 0.039 0.023 0.032 0.028 0.019 0.023 0.022 0.007 0.005 0.044 0.012 0.016 0.025 0.005 0.030 0.014 0.028 0.018 0.022 0.000 0.022 0.084 0.211
age -0.005 0.010 0.122 1.000 0.045 0.000 0.065 0.039 0.038 0.046 0.038 0.019 0.016 0.080 0.012 0.034 0.040 0.021 0.032 0.023 0.044 0.040 0.037 0.009 0.036 0.125 0.361
rating 0.278 0.260 0.039 0.045 1.000 0.003 0.033 0.020 0.009 0.045 0.079 0.028 0.018 0.115 0.034 0.047 0.051 0.006 0.023 0.040 0.018 0.022 0.088 0.015 0.071 0.045 0.081
unknown 0.000 0.992 0.023 0.000 0.003 1.000 0.004 0.000 0.000 0.000 0.004 0.000 0.000 0.006 0.000 0.000 0.000 0.000 0.000 0.002 0.000 0.003 0.000 0.000 1.000 0.000 0.000
Action 0.227 0.992 0.032 0.065 0.033 0.004 1.000 0.451 0.099 0.145 0.223 0.007 0.051 0.270 0.013 0.078 0.007 0.091 0.033 0.018 0.324 0.250 0.167 0.063 0.672 0.060 0.084
Adventure 0.155 0.992 0.028 0.039 0.020 0.000 0.451 1.000 0.024 0.100 0.113 0.030 0.035 0.224 0.088 0.053 0.059 0.025 0.043 0.018 0.295 0.049 0.087 0.010 0.697 0.026 0.052
Animation 0.139 0.992 0.019 0.038 0.009 0.000 0.099 0.024 1.000 0.555 0.029 0.057 0.016 0.157 0.026 0.025 0.028 0.418 0.045 0.085 0.045 0.077 0.056 0.026 0.680 0.008 0.036
Children's 0.185 0.992 0.023 0.046 0.045 0.000 0.145 0.100 0.555 1.000 0.083 0.082 0.024 0.130 0.238 0.037 0.066 0.381 0.055 0.119 0.042 0.144 0.085 0.031 0.761 0.034 0.038
Comedy 0.176 0.992 0.022 0.038 0.079 0.004 0.223 0.113 0.029 0.083 1.000 0.091 0.057 0.347 0.017 0.086 0.074 0.035 0.111 0.096 0.146 0.290 0.120 0.000 0.756 0.019 0.037
Crime 0.094 0.992 0.007 0.019 0.028 0.000 0.007 0.030 0.057 0.082 0.091 1.000 0.025 0.064 0.005 0.164 0.015 0.067 0.088 0.102 0.087 0.124 0.095 0.040 0.616 0.023 0.017
Documentary 0.061 0.992 0.005 0.016 0.018 0.000 0.051 0.035 0.016 0.024 0.057 0.025 1.000 0.058 0.009 0.011 0.020 0.019 0.020 0.043 0.033 0.046 0.006 0.011 0.999 0.000 0.031
Drama 0.207 0.992 0.044 0.080 0.115 0.006 0.270 0.224 0.157 0.130 0.347 0.064 0.058 1.000 0.020 0.083 0.159 0.096 0.069 0.013 0.174 0.163 0.099 0.033 0.742 0.034 0.082
Fantasy 0.000 0.992 0.012 0.012 0.034 0.000 0.013 0.088 0.026 0.238 0.017 0.005 0.009 0.020 1.000 0.015 0.027 0.026 0.027 0.017 0.126 0.047 0.037 0.015 0.614 0.000 0.010
Film-Noir 0.072 0.992 0.016 0.034 0.047 0.000 0.078 0.053 0.025 0.037 0.086 0.164 0.011 0.083 0.015 1.000 0.031 0.030 0.232 0.055 0.016 0.110 0.043 0.018 0.645 0.010 0.029
Horror 0.207 0.992 0.025 0.040 0.051 0.000 0.007 0.059 0.028 0.066 0.074 0.015 0.020 0.159 0.027 0.031 1.000 0.054 0.000 0.076 0.034 0.070 0.076 0.032 0.718 0.017 0.049
Musical 0.102 0.992 0.005 0.021 0.006 0.000 0.091 0.025 0.418 0.381 0.035 0.067 0.019 0.096 0.026 0.030 0.054 1.000 0.054 0.010 0.081 0.111 0.055 0.031 0.640 0.017 0.021
Mystery 0.119 0.992 0.030 0.032 0.023 0.000 0.033 0.043 0.045 0.055 0.111 0.088 0.020 0.069 0.027 0.232 0.000 0.054 1.000 0.060 0.031 0.230 0.076 0.032 0.579 0.002 0.029
Romance 0.118 0.992 0.014 0.023 0.040 0.002 0.018 0.018 0.085 0.119 0.096 0.102 0.043 0.013 0.017 0.055 0.076 0.010 0.060 1.000 0.063 0.106 0.127 0.052 0.638 0.049 0.039
Sci-Fi 0.142 0.992 0.028 0.044 0.018 0.000 0.324 0.295 0.045 0.042 0.146 0.087 0.033 0.174 0.126 0.016 0.034 0.081 0.031 0.063 1.000 0.047 0.167 0.052 0.613 0.044 0.055
Thriller 0.177 0.992 0.018 0.040 0.022 0.003 0.250 0.049 0.077 0.144 0.290 0.124 0.046 0.163 0.047 0.110 0.070 0.111 0.230 0.106 0.047 1.000 0.100 0.073 0.770 0.030 0.047
War 0.083 0.992 0.022 0.037 0.088 0.000 0.167 0.087 0.056 0.085 0.120 0.095 0.006 0.099 0.037 0.043 0.076 0.055 0.076 0.127 0.167 0.100 1.000 0.023 0.636 0.018 0.038
Western 0.040 0.992 0.000 0.009 0.015 0.000 0.063 0.010 0.026 0.031 0.000 0.040 0.011 0.033 0.015 0.018 0.032 0.031 0.032 0.052 0.052 0.073 0.023 1.000 0.853 0.018 0.021
genre 0.085 0.992 0.022 0.036 0.071 1.000 0.672 0.697 0.680 0.761 0.756 0.616 0.999 0.742 0.614 0.645 0.718 0.640 0.579 0.638 0.613 0.770 0.636 0.853 1.000 0.081 0.029
sex 0.995 0.146 0.084 0.125 0.045 0.000 0.060 0.026 0.008 0.034 0.019 0.023 0.000 0.034 0.000 0.010 0.017 0.017 0.002 0.049 0.044 0.030 0.018 0.018 0.081 1.000 0.409
occupation 0.995 0.052 0.211 0.361 0.081 0.000 0.084 0.052 0.036 0.038 0.037 0.017 0.031 0.082 0.010 0.029 0.049 0.021 0.029 0.039 0.055 0.047 0.038 0.021 0.029 0.409 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
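
The nullity correlation described above can be approximated with plain pandas. The sketch below assumes the heatmap is the Pearson correlation between per-column missing-value indicators, with columns that are never or always missing left out because their correlation is undefined. The function name `nullity_correlation` and the `toy` frame are hypothetical and are not taken from the profiled dataset.

```python
import numpy as np
import pandas as pd


def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between the missing-value indicators of each column."""
    # 1 where a value is missing, 0 where it is present
    indicators = df.isnull().astype(int)
    # A column that is never (or always) missing has zero variance,
    # so its correlation with anything is undefined; leave it out
    varying = indicators.loc[:, indicators.nunique() > 1]
    return varying.corr()


# Hypothetical toy data, only to show the sign of the result
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, 2.0, np.nan, 4.0],  # missing exactly when "a" is present
    "c": [1.0, np.nan, 3.0, np.nan],  # missing exactly when "a" is missing
    "d": [1.0, 2.0, 3.0, 4.0],        # never missing, so it is dropped
})
result = nullity_correlation(toy)
print(result)
```

A value near 1 means two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the two missingness patterns are unrelated.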

Sample

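index | user_id | movie_id | rating | unix_timestamp | title | release_date | video_release_date | imdb_url | unknown | Action | Adventure | Animation | Children's | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western | year | genre | all_genres | age | sex | occupation | zip_code
0 | 195 | 241 | 3.000 | 881250949 | Kolya (1996) | 24-Jan-1997 | NaN | http://us.imdb.com/M/title-exact?Kolya%20(1996) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1997 | Comedy | Comedy | 49 | M | writer | 55105
1 | 195 | 256 | 2.000 | 881251577 | Men in Black (1997) | 04-Jul-1997 | NaN | http://us.imdb.com/M/title-exact?Men+in+Black+(1997) | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1997 | Action | Action-Adventure-Comedy-Sci-Fi | 49 | M | writer | 55105
2 | 195 | 110 | 4.000 | 881251793 | Truth About Cats & Dogs, The (1996) | 26-Apr-1996 | NaN | http://us.imdb.com/M/title-exact?Truth%20About%20Cats%20&%20Dogs,%20The%20(1996) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1996 | Comedy | Comedy-Romance | 49 | M | writer | 55105
3 | 195 | 24 | 4.000 | 881251955 | Birdcage, The (1996) | 08-Mar-1996 | NaN | http://us.imdb.com/M/title-exact?Birdcage,%20The%20(1996) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1996 | Comedy | Comedy | 49 | M | writer | 55105
4 | 195 | 381 | 4.000 | 881251843 | Adventures of Priscilla, Queen of the Desert, The (1994) | 01-Jan-1994 | NaN | http://us.imdb.com/M/title-exact?Adventures%20of%20Priscilla,%20Queen%20of%20the%20Desert,%20The%20(1994) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1994 | Comedy | Comedy-Drama | 49 | M | writer | 55105
5 | 195 | 201 | 3.000 | 881251728 | Groundhog Day (1993) | 01-Jan-1993 | NaN | http://us.imdb.com/M/title-exact?Groundhog%20Day%20(1993) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1993 | Romance | Comedy-Romance | 49 | M | writer | 55105
6 | 195 | 152 | 5.000 | 881251820 | Fish Called Wanda, A (1988) | 01-Jan-1988 | NaN | http://us.imdb.com/M/title-exact?Fish%20Called%20Wanda,%20A%20(1988) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1988 | Comedy | Comedy | 49 | M | writer | 55105
7 | 195 | 285 | 5.000 | 881250949 | English Patient, The (1996) | 15-Nov-1996 | NaN | http://us.imdb.com/M/title-exact?English%20Patient,%20The%20(1996) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1996 | Drama | Drama-Romance-War | 49 | M | writer | 55105
8 | 195 | 65 | 3.000 | 881251911 | While You Were Sleeping (1995) | 01-Jan-1995 | NaN | http://us.imdb.com/M/title-exact?While%20You%20Were%20Sleeping%20(1995) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1995 | Comedy | Comedy-Romance | 49 | M | writer | 55105
9 | 195 | 844 | 4.000 | 881251954 | That Thing You Do! (1996) | 28-Sep-1996 | NaN | http://us.imdb.com/M/title-exact?That%20Thing%20You%20Do!%20(1996) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1996 | Comedy | Comedy | 49 | M | writer | 55105

index | user_id | movie_id | rating | unix_timestamp | title | release_date | video_release_date | imdb_url | unknown | Action | Adventure | Animation | Children's | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western | year | genre | all_genres | age | sex | occupation | zip_code
99990 | 872 | 288 | 2.000 | 891392577 | Evita (1996) | 25-Dec-1996 | NaN | http://us.imdb.com/M/title-exact?Evita%20(1996) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1996 | Drama | Drama-Musical | 48 | F | administrator | 33763
99991 | 872 | 291 | 5.000 | 891392177 | Rosewood (1997) | 21-Feb-1997 | NaN | http://us.imdb.com/M/title-exact?Rosewood%20(1997) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1997 | Drama | Drama | 48 | F | administrator | 33763
99992 | 872 | 268 | 2.000 | 891392092 | Full Monty, The (1997) | 01-Jan-1997 | NaN | http://us.imdb.com/M/title-exact?Full+Monty%2C+The+(1997) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1997 | Comedy | Comedy | 48 | F | administrator | 33763
99993 | 872 | 874 | 1.000 | 891392577 | She's So Lovely (1997) | 22-Aug-1997 | NaN | http://us.imdb.com/M/title-exact?She%27s+So+Lovely+(1997) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1997 | Romance | Drama-Romance | 48 | F | administrator | 33763
99994 | 872 | 299 | 4.000 | 891392238 | Air Force One (1997) | 01-Jan-1997 | NaN | http://us.imdb.com/M/title-exact?Air+Force+One+(1997) | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1997 | Thriller | Action-Thriller | 48 | F | administrator | 33763
99995 | 872 | 312 | 5.000 | 891392177 | Titanic (1997) | 01-Jan-1997 | NaN | http://us.imdb.com/M/title-exact?imdb-title-120338 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1997 | Drama | Action-Drama-Romance | 48 | F | administrator | 33763
99996 | 872 | 325 | 4.000 | 891392656 | G.I. Jane (1997) | 01-Jan-1997 | NaN | http://us.imdb.com/M/title-exact?G%2EI%2E+Jane+(1997) | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1997 | War | Action-Drama-War | 48 | F | administrator | 33763
99997 | 872 | 347 | 3.000 | 891392577 | Desperate Measures (1998) | 30-Jan-1998 | NaN | http://us.imdb.com/Title?Desperate+Measures+(1998) | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1998 | Thriller | Crime-Drama-Thriller | 48 | F | administrator | 33763
99998 | 872 | 357 | 2.000 | 891392698 | Spawn (1997) | 01-Aug-1997 | NaN | http://us.imdb.com/M/title-exact?Spawn+(1997/I) | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1997 | Adventure | Action-Adventure-Sci-Fi-Thriller | 48 | F | administrator | 33763
99999 | 872 | 341 | 4.000 | 891392698 | Man Who Knew Too Little, The (1997) | 01-Jan-1997 | NaN | http://us.imdb.com/M/title-exact?Man+Who+Knew+Too+Little%2C+The+(1997) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1997 | Comedy | Comedy-Mystery | 48 | F | administrator | 33763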
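
In the sample rows above, all_genres looks like the names of the genre columns whose flag is 1, joined with hyphens in column order. The sketch below reproduces that pattern for two of the sampled titles. It is an assumption inferred from the sample, not the report author's code, and the helpers `flags_for` and `join_genres` are hypothetical names. How the single-valued genre column is chosen cannot be inferred from the sample, so it is not reproduced here.

```python
import pandas as pd

# Genre flag columns, in the order they appear in the sample header
GENRE_COLUMNS = [
    "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


def flags_for(*genres: str) -> dict:
    """Build a 0/1 flag for every genre column (hypothetical helper)."""
    return {name: int(name in genres) for name in GENRE_COLUMNS}


def join_genres(row: pd.Series) -> str:
    """Join the names of the flagged genre columns with hyphens, in column order."""
    return "-".join(name for name in GENRE_COLUMNS if row[name] == 1)


# Two titles taken from the sample rows
sample = pd.DataFrame([
    {"title": "Men in Black (1997)",
     **flags_for("Action", "Adventure", "Comedy", "Sci-Fi")},
    {"title": "English Patient, The (1996)",
     **flags_for("Drama", "Romance", "War")},
])
sample["all_genres"] = sample.apply(join_genres, axis=1)
print(sample[["title", "all_genres"]])
```

Because all_genres would be fully determined by the flag columns under this assumption, it carries no information beyond them.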